Ensifer medicae WSM1369 is an aerobic, motile, Gram-negative, non-spore-forming rod
that can exist as a soil saprophyte or as a legume microsymbiont of Medicago. WSM1369
was isolated in 1993 from a nodule recovered from the roots of Medicago sphaerocarpos
growing at San Pietro di Rudas, near Aggius in Sardinia (Italy). WSM1369 is an effective
microsymbiont of the annual forage legumes M. polymorpha and M. sphaerocarpos. Here
we describe the features of E. medicae WSM1369, together with genome sequence
information and its annotation. The 6,402,557 bp standard draft genome is arranged into 307
scaffolds of 307 contigs containing 6,656 protein-coding genes and 79 RNA-only encoding
genes. This rhizobial genome is one of 100 sequenced as part of the DOE Joint Genome
Institute 2010 Genomic Encyclopedia for Bacteria and Archaea-Root Nodule Bacteria
(GEBA-RNB) project.



Introduction 

One of the key nutritional constraints to plant growth and development is the
availability of nitrogen (N) in nutrient-deprived soils [1]. Although the atmosphere
consists of approximately 80% N, the overwhelming proportion of this is present in
the form of dinitrogen (N2), which is biologically inaccessible to most plants and
other higher organisms. Before the development of the Haber-Bosch process, the
primary mechanism for converting atmospheric N2 into a bioaccessible form was
biological nitrogen fixation (BNF) [2]. In BNF, N2 is made available by specialized
microbes that possess the molecular machinery needed to reduce N2 to NH3. Some
plants, most of which are legumes, have harnessed BNF by evolving symbiotic
relationships with specific N2-fixing microbes (termed rhizobia), whereby the host
plant houses the bacteria in root nodules, supplies the microsymbiont with carbon
and in return receives essential reduced N-containing products [3]. When BNF is
exploited in agriculture, some of the N2 fixed into plant tissues is ultimately
released into the soil following harvest or senescence, where it can then be
assimilated by subsequent crops. Compared to industrially synthesized N-based
fertilizers, BNF is a low-energy, low-cost and low greenhouse-gas producing
alternative, and hence its application is crucial to increasing the environmental
and economic sustainability of farming systems [4].
Forage and fodder legumes play vital roles in sustainable farming practice, with
approximately 110 million ha under production worldwide [5], a significant
proportion of which is made up by members of the genus Medicago. Ensifer meliloti
and E. medicae are known to nodulate and fix N2 with Medicago spp. [6], although
they differ in host specificity. While E. meliloti strains do not nodulate M. murex,
nodulate but do not fix N2 with M. polymorpha and nodulate but fix very poorly with
M. arabica [7,8], they are able to nodulate and fix N2 with Medicago species
originating from alkaline soils, including the perennial M. sativa and the annuals
M. littoralis and M. truncatula [9,10]. In contrast, E. medicae strains can nodulate
and fix N2 with annuals well adapted to acidic soils, such as M. murex, M. arabica
and M. polymorpha [7,8].

The E. medicae strain WSM1369 was isolated from a nodule collected from M.
sphaerocarpos growing at San Pietro di Rudas, near Aggius in Sardinia (Italy). This
strain nodulates and fixes N2 effectively with M. polymorpha and M. sphaerocarpos
[8]. Like M. murex and M. polymorpha, M. sphaerocarpos is an annual species that is
tolerant of low pH soils [11], with studies suggesting that it only establishes
N2-fixing associations with E. medicae strains [8,9]. However, owing to a paucity of
symbiotic information, it is not yet clear whether M. sphaerocarpos fixes N2 with a
wide range of E. medicae strains or whether this ability is restricted to a smaller
set of E. medicae accessions. Therefore, genome sequences of E. medicae strains
effective with M. sphaerocarpos will provide a valuable genetic resource to further
investigate the symbiotaxonomy of Medicago-nodulating rhizobia and will further
enhance the existing available genome data for Ensifer microsymbionts [12-15]. Here
we present a summary classification and a set of general features for this
microsymbiont together with a description of its genome sequence and annotation.

Classification and features 

E. medicae WSM1369 is a motile, non-sporulating, non-encapsulated, Gram-negative rod
in the order Rhizobiales of the class Alphaproteobacteria. The rod-shaped form
varies in size with dimensions of approximately 0.25-0.5 μm in width and 1.0-1.5 μm
in length (Figure 1 Left and 1 Center). It is fast growing, forming colonies within
3-4 days when grown on TY agar [16] or half strength Lupin Agar (½LA) [17] at 28°C.
Colonies on ½LA are opaque, slightly domed and moderately mucoid with smooth margins
(Figure 1 Right).

Minimum Information about the Genome Sequence (MIGS) is provided in Table 1. Figure
2 shows the phylogenetic neighborhood of E. medicae WSM1369 in a 16S rRNA sequence
based tree. This strain shares 100% sequence identity (over 1290 bp) with the 16S
rRNA of E. medicae A321T and E. medicae WSM419 [13] and 99% sequence identity
(1362/1366 bp) with the 16S rRNA of E. meliloti Sm1021 [12].
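The identity figures quoted above are simple ratios of matching positions to aligned positions. The following is a minimal sketch of that arithmetic, not the procedure used by the authors; the function names are hypothetical, and skipping gap columns is an assumed convention (some tools count gaps as mismatches instead).

```python
def percent_identity(matches: int, aligned_length: int) -> float:
    """Return pairwise identity as a percentage of aligned positions."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return 100.0 * matches / aligned_length


def identity_from_alignment(seq_a: str, seq_b: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences.

    Assumption: columns where either sequence has a gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != "-" and b != "-"
    ]
    matches = sum(a == b for a, b in pairs)
    return percent_identity(matches, len(pairs))


# Figures reported in the text for WSM1369 versus E. meliloti Sm1021.
sm1021_identity = percent_identity(1362, 1366)
print(f"{sm1021_identity:.2f}%")
```

With the reported counts, 1362 matches over 1366 positions works out to about 99.71%, which the text gives as 99%.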

Symbiotaxonomy 

E. medicae strain WSM1369 was isolated in 1993 from a nodule collected from the
annual M. sphaerocarpos growing at San Pietro di Rudas, near Aggius, Sardinia in
Italy (J. G. Howieson, pers. comm.). The site of collection was undulating
grassland, with a soil derived from granite materials that had a depth of 20-40 cm
and a pH of 6.0. The soil was a loamy-sand, and Lathyrus and Trifolium spp. grew in
association with M. sphaerocarpos. WSM1369 forms nodules (Nod+) and fixes N2 (Fix+)
with M. polymorpha and M. sphaerocarpos [8].




Figure 1. Images of Ensifer medicae WSM1369 using scanning (Left) and transmission (Center) electron microscopy 
and the appearance of colony morphology on half strength lupin agar (Right). 
Table 1. Classification and general features of Ensifer medicae WSM1369 according
to the MIGS recommendations [18]

MIGS ID    Property                 Term                        Evidence code

           Current classification   Domain Bacteria             TAS [19]
                                    Phylum Proteobacteria       TAS [20]
                                    Class Alphaproteobacteria   TAS [21,22]
                                    Order Rhizobiales           TAS [21,23]
                                    Family Rhizobiaceae         TAS [24,25]
                                    Genus Ensifer               TAS [26-28]
                                    Species Ensifer medicae     TAS [27]
                                    Strain WSM1369              TAS [8]
           Gram stain               Negative                    IDA
           Cell shape               Rod                         IDA
           Motility                 Motile                      IDA
           Sporulation              Non-sporulating             NAS
           Temperature range        Mesophile                   NAS
           Optimum temperature      28°C                        IDA
           Salinity                 Non-halophile               NAS
MIGS-22    Oxygen requirement       Aerobic                     TAS [8]
           Carbon source            Varied                      NAS
           Energy source            Chemoorganotroph            NAS
MIGS-6     Habitat                  Soil, root nodule, on host  NAS
MIGS-15    Biotic relationship      Free living, symbiotic      TAS [8]
MIGS-14    Pathogenicity            Non-pathogenic              NAS
           Biosafety level          1                           TAS [29]
           Isolation                Root nodule                 TAS [8]
MIGS-4     Geographic location      Sardinia, Italy             TAS [8]
MIGS-5     Soil collection date     28 April 1993               IDA
MIGS-4.1   Longitude                9.019167                    IDA
MIGS-4.2   Latitude                 40.971667                   IDA
MIGS-4.3   Depth                    0-10 cm                     IDA
MIGS-4.4   Altitude                 Not recorded                IDA


Evidence codes - IDA: Inferred from Direct Assay; TAS: Traceable Author Statement 
(i.e., a direct report exists in the literature); NAS: Non-traceable Author Statement 
(i.e., not directly observed for the living, isolated sample, but based on a generally 
accepted property for the species, or anecdotal evidence). These evidence codes are 
from the Gene Ontology project [30]. 
[Figure 2 image: 16S rRNA phylogenetic tree; scale bar 0.002. Taxa shown, from top to bottom:]

Ensifer meliloti GVPV12 (Gi08912)
Ensifer meliloti AK83 (Gc01810)*
Ensifer meliloti Sm1021 (Gc00059)*
Ensifer meliloti SM11 (CP001830 Gc01686)*
Ensifer meliloti Mlalz-1 (Gi08913)
Ensifer meliloti RRI128 (Gi08915)
Ensifer meliloti AK58 (Gi07577)
Ensifer meliloti MVII-I (Gi08914)
Ensifer meliloti CIAM1775 (Gi08844)
Ensifer meliloti LMG 6133T (X67222)
Ensifer meliloti WSM1022 (Gi08916)
Ensifer meliloti 4H41 (Gi08911)
Ensifer medicae WSM419 (Gc00590)*
Ensifer medicae WSM1369 (Gi08907)
Ensifer medicae WSM1115 (Gi08906)
Ensifer medicae Di28 (Gi08905)
Ensifer medicae WSM244 (Gi08916)
Ensifer medicae WSM4191 (Gi08903)
Ensifer medicae A321T (L39882)
Ensifer arboris LMG 14919 (Gi08822) (syn: HAMBI 1552T)
Ensifer saheli LMG 7837T (X68390)
Ensifer kostiense LMG 19227T (AM181748)
Ensifer sp. TW10 (Gi08835)
Ensifer sp. WSM1721 (Gi08904)
Ensifer chiapanecum ITTG S70 (EU286550)
Ensifer mexicanum ITTG R7T (DQ411930)
Ensifer terangae LMG 7834T (X68388)


Figure 2. Phylogenetic tree showing the relationship of Ensifer medicae WSM1369 (shown in bold print) to other 
Ensifer spp. in the order Rhizobiales based on aligned sequences of the 16S rRNA gene (1,290 bp internal region). 
All sites were informative and there were no gap-containing sites. Phylogenetic analyses were performed using 
MEGA, version 5 [31]. The tree was built using the Maximum-Likelihood method with the General Time Reversible 
model [32]. Bootstrap analysis [33] with 500 replicates was performed to assess the support of the clusters. Type 
strains are indicated with a superscript T. Brackets after the strain name contain a DNA database accession number 
and/or a GOLD ID (beginning with the prefix G) for a sequencing project registered in GOLD [34]. Published ge- 
nomes are indicated with an asterisk. 
Table 2. Genome sequencing project information for E. medicae WSM1369

MIGS ID     Property               Term

MIGS-31     Finishing quality      Standard draft
MIGS-28     Libraries used         One Illumina fragment library
MIGS-29     Sequencing platforms   Illumina HiSeq 2000
MIGS-31.2   Sequencing coverage    Illumina: 321×
MIGS-30     Assemblers             Velvet version 1.1.04; Allpaths-LG version r39750
MIGS-32     Gene calling methods   Prodigal 1.4
            GenBank                (accession illegible)
            GenBank release date   August (day and year illegible)
            GOLD ID                Gi08907
            NCBI project ID        165337
            Database: IMG          2513237156
            Project relevance      Symbiotic N2 fixation, agriculture




Genome sequencing and annotation 

Genome project history 

This organism was selected for sequencing on the 
basis of its environmental and agricultural rele- 
vance to issues in global carbon cycling, alterna- 
tive energy production, and biogeochemical im- 
portance, and is part of the Community Sequenc- 
ing Program at the U.S. Department of Energy, 
Joint Genome Institute (JGI) for projects of rele- 
vance to agency missions. The genome project is 
deposited in the Genomes OnLine Database [34] 
and a standard draft genome sequence in IMG. Se- 
quencing, finishing and annotation were per- 
formed by the JGI. A summary of the project in- 
formation is shown in Table 2. 

Growth conditions and DNA isolation 

E. medicae WSM1369 was cultured to mid loga- 
rithmic phase in 60 ml of TY rich medium on a gy- 
ratory shaker at 28°C [35]. DNA was isolated from 
the cells using a CTAB (Cetyl trimethyl ammonium 
bromide) bacterial genomic DNA isolation method 
[36]. 

Genome sequencing and assembly 

The genome of Ensifer medicae WSM1369 was sequenced at the Joint Genome Institute
(JGI) using Illumina technology [37]. An Illumina standard shotgun library was
constructed and sequenced using the Illumina HiSeq 2000 platform, which generated
13,712,318 reads totaling 2,057 Mbp.



All general aspects of library construction and sequencing performed at the JGI can
be found at the JGI user home [36]. All raw Illumina sequence data was passed
through DUK, a filtering program developed at JGI, which removes known Illumina
sequencing and library preparation artifacts (Mingkun, L., Copeland, A. and Han, J.,
unpublished). The following steps were then performed for assembly: (1) filtered
Illumina reads were assembled using Velvet [38] (version 1.1.04), (2) 1-3 Kbp
simulated paired end reads were created from Velvet contigs using wgsim [39], (3)
Illumina reads were assembled with simulated read pairs using Allpaths-LG [40]
(version r39750). Parameters for the assembly steps were: 1) Velvet (velveth: 63
-shortPaired and velvetg: -very_clean yes -exportFiltered yes -min_contig_lgth 500
-scaffolding no -cov_cutoff 10), 2) wgsim (-e 0 -1 76 -2 76 -r 0 -R 0 -X 0), 3)
Allpaths-LG (PrepareAllpathsInputs: PHRED_64=1 PLOIDY=1 FRAG_COVERAGE=125
JUMP_COVERAGE=25 LONG_JUMP_COV=50, RunAllpathsLG: THREADS=8 RUN=std_shredpairs
TARGETS=standard VAPI_WARN_ONLY=True OVERWRITE=True). The final draft assembly
contained 307 contigs in 307 scaffolds. The total size of the genome is 6.4 Mbp and
the final assembly is based on 2,057 Mbp of Illumina data, which provides an average
321× coverage of the genome.
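The coverage figure follows directly from the totals reported in this section. The sketch below reproduces that arithmetic from the stated read count, sequence yield and genome size; the variable names are illustrative, and the mean read length is derived from the totals rather than stated in the text.

```python
# Totals as reported in the text.
READS = 13_712_318             # Illumina reads generated
TOTAL_BASES = 2_057_000_000    # 2,057 Mbp of sequence data
GENOME_SIZE = 6_402_557        # draft genome size in bp

# Mean read length implied by the totals (derived, not stated in the text).
mean_read_length = TOTAL_BASES / READS

# Average depth of coverage = sequenced bases / genome size.
coverage = TOTAL_BASES / GENOME_SIZE

print(f"mean read length: {mean_read_length:.1f} bp")
print(f"average coverage: {coverage:.0f}x")
```

Under these figures the implied mean read length is close to 150 bp and the depth rounds to 321×, consistent with the value quoted above.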
Genome annotation

Genes were identified using Prodigal [41] as part of the DOE-JGI annotation pipeline
[42]. The predicted CDSs were translated and used to search the National Center for
Biotechnology Information (NCBI) nonredundant database, UniProt, TIGRFam, Pfam,
PRIAM, KEGG, COG, and InterPro databases. The tRNAscan-SE tool [43] was used to find
tRNA genes, whereas ribosomal RNA genes were found by searches against models of the
ribosomal RNA genes built from SILVA [44]. Other non-coding RNAs such as the RNA
components of the protein secretion complex and the RNase P were identified by
searching the genome for the corresponding Rfam profiles using INFERNAL [45].
Additional gene prediction analysis and manual functional annotation were performed
within the Integrated Microbial Genomes (IMG-ER) platform [46].

Genome properties 

The genome is 6,402,557 nucleotides long with 61.13% GC content (Table 3) and
comprises 307 scaffolds (Figure 3) of 307 contigs. Of a total of 6,735 genes, 6,656
were protein-encoding and 79 were RNA-only encoding genes. The majority of genes
(74.14%) were assigned a putative function, while the remaining genes were annotated
as hypothetical. The distribution of genes into COG functional categories is
presented in Table 4.
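The percentages in this paragraph and in Table 3 can be recomputed from the raw counts. The following is a small sketch of that check, using only numbers reported in the text and Table 3; the dictionary keys are illustrative names, not IMG field names.

```python
GENOME_SIZE = 6_402_557   # bp
TOTAL_GENES = 6_735

# Base-pair attributes, expressed as a share of genome size.
bp_counts = {"coding_region": 5_536_774, "gc_bases": 3_913_921}

# Gene attributes, expressed as a share of total genes.
gene_counts = {
    "protein_coding": 6_656,
    "rna": 79,
    "with_function_prediction": 4_993,
}

bp_percent = {k: round(100 * v / GENOME_SIZE, 2) for k, v in bp_counts.items()}
gene_percent = {k: round(100 * v / TOTAL_GENES, 2) for k, v in gene_counts.items()}

# Protein-coding and RNA genes together account for every annotated gene.
assert gene_counts["protein_coding"] + gene_counts["rna"] == TOTAL_GENES

for name, value in {**bp_percent, **gene_percent}.items():
    print(f"{name}: {value}%")
```

The recomputed values agree with the reported 61.13% GC content, 86.48% coding density and 74.14% of genes with a function prediction.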



Table 3. Genome Statistics for Ensifer medicae WSM1369

Attribute                              Value        % of Total

Genome size (bp)                       6,402,557    100.00
DNA coding region (bp)                 5,536,774    86.48
DNA G+C content (bp)                   3,913,921    61.13
Number of scaffolds                    307
Number of contigs                      307
Total genes                            6,735        100.00
RNA genes                              79           1.17
rRNA operons                           1            0.01
Protein-coding genes                   6,656        98.83
Genes with function prediction         4,993        74.14
Genes assigned to COGs                 4,988        74.06
Genes assigned Pfam domains            5,185        76.99
Genes with signal peptides             508          7.54
Genes coding transmembrane proteins    1,424        21.14
CRISPR repeats                         0
[Figure 3 image: graphical map of WSM1369 scaffolds A3C5DRAFT_scaffold_0.1, 1.2, 2.3, 3.4, 4.5, 5.6 and 6.7, with the COG function color legend.]


Figure 3. Graphical map of the genome of Ensifer medicae WSM1369 showing the seven largest scaffolds.
From bottom to the top of each scaffold: Genes on forward strand (color by COG categories as denoted by 
the IMG platform), Genes on reverse strand (color by COG categories), RNA genes (tRNAs green, sRNAs red, 
other RNAs black), GC content, GC skew. 
Table 4. Number of protein coding genes of Ensifer medicae WSM1369 associated with the
general COG functional categories.

Code   Value   %age    Description

J      193     3.48    Translation, ribosomal structure and biogenesis
A      0       0.00    RNA processing and modification
K      486     8.77    Transcription
L      275     4.96    Replication, recombination and repair
B      1       0.02    Chromatin structure and dynamics
D      40      0.72    Cell cycle control, mitosis and meiosis
Y      0       0.00    Nuclear structure
V      54      0.97    Defense mechanisms
T      241     4.35    Signal transduction mechanisms
M      267     4.82    Cell wall/membrane biogenesis
N      77      1.39    Cell motility
Z      0       0.00    Cytoskeleton
W      1       0.02    Extracellular structures
U      124     2.24    Intracellular trafficking and secretion
O      184     3.32    Posttranslational modification, protein turnover, chaperones
C      308     5.56    Energy production and conversion
G      510     9.21    Carbohydrate transport and metabolism
E      613     11.06   Amino acid transport and metabolism
F      108     1.95    Nucleotide transport and metabolism
H      196     3.54    Coenzyme transport and metabolism
I      193     3.48    Lipid transport and metabolism
P      280     5.05    Inorganic ion transport and metabolism
Q      158     2.85    Secondary metabolite biosynthesis, transport and catabolism
R      662     11.95   General function prediction only
S      569     10.27   Function unknown
-      1,747   25.94   Not in COGs
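
The percentage column of Table 4 can be reproduced from the counts. The sketch below is an inference from the arithmetic rather than a description taken from the text: the per-category percentages appear to use the sum of all category assignments as the denominator, while the "Not in COGs" row appears to use the total gene count from Table 3. Variable names are illustrative.

```python
# Per-category gene counts from Table 4.
cog_counts = {
    "J": 193, "A": 0, "K": 486, "L": 275, "B": 1, "D": 40, "Y": 0,
    "V": 54, "T": 241, "M": 267, "N": 77, "Z": 0, "W": 1, "U": 124,
    "O": 184, "C": 308, "G": 510, "E": 613, "F": 108, "H": 196,
    "I": 193, "P": 280, "Q": 158, "R": 662, "S": 569,
}
NOT_IN_COGS = 1_747
TOTAL_GENES = 6_735          # from Table 3
GENES_IN_COGS = 4_988        # from Table 3

# Assumed denominator: total category assignments. This exceeds the number
# of genes assigned to COGs, which would be expected if a gene can fall
# into more than one category.
total_assignments = sum(cog_counts.values())

cog_percent = {
    code: round(100 * n / total_assignments, 2) for code, n in cog_counts.items()
}
not_in_cogs_percent = round(100 * NOT_IN_COGS / TOTAL_GENES, 2)

print(total_assignments)
print(cog_percent["E"], cog_percent["R"], not_in_cogs_percent)
```

Under this assumption the category counts sum to 5,540 assignments, and the "Not in COGs" count equals the total genes minus the genes assigned to COGs (6,735 - 4,988 = 1,747).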